Nonparanormal Distributions & Causal Inference with Single-Cell RNA-Seq Data

Author

  • Elizabeth Silver
Abstract

Background. Single-cell RNA-Seq is a new technique that can measure gene expression levels in individual cells. We would like to use single-cell RNA-Seq data to learn genetic regulatory networks. This is a natural task for causal-model structure-learning algorithms, which aim to learn the causal relationships between the measured variables. Causal algorithms perform poorly in high dimensions unless the data are Gaussian, and single-cell RNA-Seq data are non-Gaussian. However, the "nonparanormal SKEPTIC" method extends causal algorithms to high-dimensional Gaussian copula distributions, which may better approximate single-cell RNA-Seq data.

Aim. To learn a genetic regulatory network by applying the SKEPTIC to real single-cell gene expression data, validating against known regulatory interactions.

Data. Expression levels of 24,175 genes in 934 mouse embryonic stem cells were measured using inDrop single-cell RNA-Seq. 500 high-variance genes, including 120 transcription factors, were selected for network recovery.

Method. The covariance matrix over the single-cell RNA-Seq data was estimated using the SKEPTIC and used as input to causal algorithms, producing a graph over all measured genes. Performance was evaluated on (a) a set of known transcription factor binding relationships from ChIP-Seq studies, and (b) regulatory effects learned from loss-of-function/gain-of-function experiments.

Results. Previous studies did no better than chance at identifying adjacencies for eukaryotic organisms. Applying the SKEPTIC to single-cell data and using FGS for structure learning, we identified adjacencies with 22.5% precision, a 14× improvement over chance (p < 10^-45).

Conclusion. Single-cell RNA-Seq data may be used for automatic, accurate recovery of the genetic regulatory network. These networks help to organize everything from embryonic development to cancer progression. Thus, these methods can be applied in both developmental genetics and personalized cancer medicine.
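The Method step above relies on a rank-based estimate of the correlation matrix. As a rough illustration only, the sketch below implements one common form of the nonparanormal SKEPTIC estimator: compute Spearman's rho between each pair of variables and map it through 2·sin(π·rho/6). It is not the author's pipeline. The function names (`average_ranks`, `pearson`, `skeptic_correlation`) and the simulated data are assumptions made for this example, and the abstract does not say whether the Spearman or the Kendall variant was used.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def skeptic_correlation(columns):
    """Spearman-based SKEPTIC estimate.

    `columns` is a list of variables (e.g. genes), each a list of
    per-sample (e.g. per-cell) values. Returns a d x d nested list.
    """
    ranked = [average_ranks(c) for c in columns]
    d = len(columns)
    corr = [[1.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            rho = pearson(ranked[i], ranked[j])  # Spearman's rho
            corr[i][j] = corr[j][i] = 2.0 * math.sin(math.pi * rho / 6.0)
    return corr


# Simulated (hypothetical) data: two latent Gaussian variables with
# correlation 0.6, observed only through monotone, non-Gaussian transforms.
random.seed(0)
z1 = [random.gauss(0.0, 1.0) for _ in range(2000)]
z2 = [0.6 * a + 0.8 * random.gauss(0.0, 1.0) for a in z1]
x1 = [math.exp(a) for a in z1]  # skewed, strictly increasing transform
x2 = [b ** 3 for b in z2]       # heavy-tailed, strictly increasing transform

corr_latent = skeptic_correlation([z1, z2])
corr_observed = skeptic_correlation([x1, x2])
print(corr_latent[0][1], corr_observed[0][1])
```

Because the estimate depends only on ranks, the transformed variables give the same value as the latent Gaussian ones, which is the property that makes the approach attractive for Gaussian copula data. In a workflow like the one described, the resulting matrix would then be passed to a structure-learning algorithm such as FGS; that step is not shown here.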


Similar resources

A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data

Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, along with very high-resolution data on the expression of different genes in each cell at a single time point. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...


I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing

Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, the crucial transcriptional network and key hub genes that regulate the progression of preimplantation embryos remain unclear. Materials and Methods: Through single-cell RNA-sequencing (RNA-seq) of both human and mouse preimplantation embryos, ...


Approximate inference of gene regulatory network models from RNA-Seq time series data

Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of in...


Publication date: 2016